Introducing spatial data

Spatial data types

  • Vector data
  • represents space using points, lines and polygons (“features”)
  • each of these can have additional data (“attributes”)
  • R package sf

  • Raster data
  • represents space using cells of equal size
  • each cell can have additional data (“layers”)
  • R package raster

Vector data

Vector data

Raster data

Raster data

Simple features

What is a feature?

  • A feature is a thing… a city, a house, a tree
  • has a geometry describing where it is located
  • has attributes describing its other properties

  • A feature may be made up of other features
  • pixel vs. image

What are simple features?

  • open standard for spatial data types

  • points, lines and polygons = “geometries”

  • geometries are hierarchical
  • set of points is a line
  • set of lines forms a polygon
  • several polygons is a multipolygon

  • 17 geometries, of which we look at 7

  • “data frames with a spatial extension”

sf geometry types

  • A point: st_point()
  • A linestring: st_linestring()
  • A polygon: st_polygon()
  • A multipoint: st_multipoint()
  • A multilinestring: st_multilinestring()
  • A multipolygon: st_multipolygon()
  • A geometry collection: st_geometrycollection()

  • Can be created from vectors, matrices, or lists

Features made of single geometries

# POINT
(pt1 <- st_point(c(5, 2)))

# LINESTRING
(ls1 <- st_linestring(rbind(c(5, 2), c(1, 3), c(3, 4))))

# POLYGON
pl <- list(rbind(c(5, 2), c(1, 3), c(3, 4), c(5, 2)))
(poly1 <- st_polygon(pl))

Features made of multiple geometries

# MULTIPOINT
mpmat <- rbind(c(5, 2), c(1, 3), c(3, 4))
(mp1 <- st_multipoint(mpmat))

# MULTILINESTRING
mll <- list(rbind(c(1, 5), c(4, 4), c(4, 1), c(2, 2)), rbind(c(1, 
    2), c(2, 4)))
(mls1 <- st_multilinestring((mll)))

# MULTIPOLYGON
mpyl <- list(list(rbind(c(1, 5), c(2, 2), c(4, 1), c(1, 5))), 
    list(rbind(c(0, 2), c(1, 2), c(1, 3), c(0, 3), c(0, 2))))
(mpoly1 <- st_multipolygon(mpyl))

Combination of geometries

# GEOMETRYCOLLECTION
gcl <- list(ls1, pt1, mpoly1)
(geomcol <- st_geometrycollection(gcl))

Building up spatial data frames in R

From sfg to sfc

  • Objects we’ve made so far have class sfg
class(pt1)
## [1] "XY"    "POINT" "sfg"
class(mp1)
## [1] "XY"         "MULTIPOINT" "sfg"
  • A simple feature geometry column (sfc) is a list of sfg objects
  • Important because usually we have more than one feature

Combine simple features with st_sfc()

# sfc POINT
pt1 <- st_point(c(5, 2))
pt2 <- st_point(c(1, 3))
(points_sfc <- st_sfc(pt1, pt2))
## Geometry set for 2 features 
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1 ymin: 2 xmax: 5 ymax: 3
## epsg (SRID):    NA
## proj4string:    NA

class(points_sfc)
## [1] "sfc_POINT" "sfc"
is.list(points_sfc)  # is this a list?
## [1] TRUE
is.list(mp1)  # is this (multipoint) a list?
## [1] FALSE
st_geometry_type(points_sfc)
## [1] POINT POINT
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT
... TRIANGLE

From sfc to sf

  • sfg and sfc only tell you about locations

  • to these we add non-geographic attributes associated with each feature

  • an sf object is a data frame combines these two things:
  • attributes are stored in a regular data.frame
  • location information (co-ordinates) is stored in one (simple feature geometry) column (sfc) of this data frame

  • Simple features = data frames with spatial attributes stored in a list-column

lnd_point <- st_point(c(0.1, 51.5))                 # sfg object
lnd_geom <- st_sfc(lnd_point, crs = 4326)           # sfc object
lnd_attrib <- data.frame(                           # data.frame object
  name = "London",
  temperature = 25,
  date = as.Date("2017-06-21")
)
lnd_sf <- st_sf(lnd_attrib, geometry = lnd_geom)    # sf object
lnd_sf
## Simple feature collection with 1 feature and 3 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 0.1 ymin: 51.5 xmax: 0.1 ymax: 51.5
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##     name temperature       date         geometry
## 1 London          25 2017-06-21 POINT (0.1 51.5)

Coordinate reference systems (CRS)

CRSs and projections

  • Longitude and latitude identify locations on the Earth’s surface using two angles

  • Angles depend on the choice of the Earth’s shape (sphere, ellipse)

  • The shape is one part of a “coordinate reference system”: the datum

CRSs and projections

  • Lat-Long coordinates aren’t “Cartesian” so Euclidean distances don’t work

  • To put points onto a flat surface we use a projection

  • Projections can’t perfectly reproduce all aspects of the Earth’s surface, because its not a perfect sphere or ellipsoid.

  • Different projections preserve different properties, and are optimized for different regions of the Earth’s surface

  • The projection is another part of the CRS

CRSs in R

Let’s look at the CRS for the object we just created

st_crs(lnd_sf)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"

CRS’s can be specified using + an epsg code + a proj4string definition

CRSs in R

  • The epsg code refers to one well-defined coordinate reference system

  • For e.g., “4326” is standard lat-long

  • proj4string has various options and gives more flexibility

CRSs in R

Read in a lat-long dataset

x_ll <- read.csv("data/confirmed-sightings.csv")
x_ll <- st_as_sf(x_ll, coords = c("Longitude", "Latitude"), crs = 4326)
st_crs(x_ll)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"
d_ll <- st_distance(x_ll[1, ], x_ll[2, ])

Reprojecting

x_utm <- st_transform(x_ll, 
                      crs = "+proj=utm +zone=48 +datum=WGS84 +units=m +no_defs")
st_crs(x_utm)
## Coordinate Reference System:
## EPSG: 32648
## proj4string: "+proj=utm +zone=48 +datum=WGS84 +units=m
+no_defs"
d_utm <- st_distance(x_utm[1, ], x_utm[2, ])
c(d_ll, d_utm)
## Units: [m]
## [1] 229489.4 231902.2

Reading in spatial data

From a shape file

occupancies <- st_read("data/Mongolia_SL.shp", quiet = TRUE)
head(occupancies)
## Simple feature collection with 6 features and 2 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 26817.32 ymin: 5744980 xmax: 106817.3 ymax:
5804980
## epsg (SRID): 32648
## proj4string: +proj=utm +zone=48 +datum=WGS84 +units=m
+no_defs
## Team Occ geometry
## 1 A 0.20980194 POLYGON ((66817.32 5784980,...
## 2 A 0.28545357 POLYGON ((86817.32 5784980,...
## 3 A 0.25328456 POLYGON ((66817.32 5764980,...
## 4 A 0.14932811 POLYGON ((86817.32 5764980,...
## 5 A 0.28437589 POLYGON ((26817.32 5744980,...
## 6 A 0.09210202 POLYGON ((46817.32 5744980,...

plot(st_geometry(occupancies))

plot(occupancies["Team"])

plot(occupancies["Occ"])

occupancies %>% st_crs()  # check the CRS
## Coordinate Reference System:
## EPSG: 32648
## proj4string: "+proj=utm +zone=48 +datum=WGS84 +units=m
+no_defs"
occupancies <- st_read("data/Mongolia_SL.shp", quiet = TRUE) %>% 
    st_set_crs("+proj=utm +zone=48 +datum=WGS84 +units=m +no_defs")
occupancies %>% st_crs()
## Coordinate Reference System:
## EPSG: 32648
## proj4string: "+proj=utm +zone=48 +datum=WGS84 +units=m
+no_defs"

From a spreadsheet or non-spatial data frame

sights <- read.csv("data/confirmed-sightings.csv")
sights <- st_as_sf(sights, coords = c("Longitude", "Latitude"), 
    crs = 4326)
sights %>% st_crs()
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"
# reprojecting
my_crs <- "+proj=utm +zone=48 +datum=WGS84 +units=m +no_defs"
sights <-  st_transform(sights, crs = my_crs)

plot(sights["Elevation"])

Raster from file

raster_filepath <- system.file("raster/srtm.tif", package = "spDataLarge")
new_raster <- raster(raster_filepath)
plot(new_raster)

Raster from data frame

mydata <- expand.grid(x = 1:100, y = 1:100) %>% 
  mutate(z1 = x + y, z2 = x - y)
myraster <- rasterFromXYZ(mydata)
plot(myraster[["z2"]])

Raster from scratch

raster1 <- raster(nrows = 10, ncols = 10, res = 1, xmn = 0, xmx = 10, 
    ymn = 0, ymx = 10, vals = 1:100)
plot(raster1)

Multiple layer rasters

raster2 <- raster(nrows = 10, ncols = 10, res = 1, xmn = 0, xmx = 10, 
    ymn = 0, ymx = 10, vals = runif(100))
myrb <- brick(raster1, raster2)
plot(myrb)

Manipulating spatial data

Filter

dplyr verbs work with sf objects in intuitive ways

occ_gt90 <- occupancies %>% filter(Occ > 0.6)
plot(occ_gt90)

Select polygons with a sighting

Some specialised spatial functions like st_intersects

sights_per_cell <- st_intersects(x = occupancies, y = sights)
sel_logical <- lengths(sights_per_cell) > 0
occ_with_sights <- occupancies[sel_logical, ]
plot(occ_with_sights["Occ"])

Grouped summarize

occupancies %>% group_by(Team) %>% summarize(meanOcc = mean(Occ)) %>% 
    head()

Can also summarize geometries

occ_per_team <- occupancies %>% group_by(Team) %>%
  summarize(meanOcc = mean(Occ),
            geometry = st_union(geometry))
plot(occ_per_team[1:12,"meanOcc"])

Static maps

tmap

  • similar syntax to ggplot2, tailored to maps
  • shape is a spatial object with class sf, sp or raster.
map0 <- tm_shape(occupancies) + tm_fill()
map1 <- tm_shape(occupancies) + tm_borders()
tmap_arrange(map0, map1, nrow = 1)

map2 <- tm_shape(occupancies) + tm_polygons()
map3 <- tm_shape(occupancies) + tm_polygons("Occ")
tmap_arrange(map2, map3, nrow = 1)

  • Multiple layers per shape
tm_shape(occupancies) + tm_polygons("Occ") + tm_text("Team", 
    size = 0.4)

  • Multiple shapes
tm_shape(occupancies) + tm_polygons("Occ") + tm_shape(sights) + 
    tm_dots("Elevation", size = 0.5)
Color settings are especially important for maps. Here use of the same palette obscures detail.

Color settings are especially important for maps. Here use of the same palette obscures detail.

  • Sequential palettes are good for values that have natural ordering from low to high
tm_shape(occupancies) + tm_polygons("Occ") + tm_shape(sights) + 
    tm_dots("Elevation", size = 0.5, palette = "BuGn")

  • Categorical palettes are best for data with no natural ordering
tm_shape(occupancies) + tm_polygons("Occ") + tm_shape(sights) + 
    tm_dots("Mountain", size = 0.5, palette = "Set3")

  • Diverging palettes emphasise differences from some central reference point
tm_shape(occupancies) + tm_polygons("Occ") + tm_shape(sights) + 
    tm_dots("Elevation", size = 0.5, palette = "PRGn")

  • faceting
tm_shape(occupancies) +
  tm_polygons(c("Occ", "Team")) +
  tm_facets(sync = TRUE, ncol = 2)

ggplot

  • geom_sf plots sf objects
ggplot() + geom_sf(data = occupancies)

  • geom_sf plots sf objects
ggplot() + geom_sf(data = occupancies, aes(fill = Occ))

  • Can plot multiple sf objects
ggplot() + geom_sf(data = occupancies, aes(fill = Occ)) +
  geom_sf(data = sights, colour = "red")

  • scale_XXX_brewer for discrete scales, scale_XXX_distiller for continuous scales
ggplot() + geom_sf(data = occupancies, aes(fill = Occ)) +
  geom_sf(data = sights, aes(colour = Elevation)) +
  scale_colour_distiller(palette = "PRGn")

Interactive maps

package leaflet

  • provides an interface to the Leaflet JavaScript library
  • leaflet() makes a map object that can be piped to other leaflet functions
leaflet() %>% addTiles()

  • Set a starting origin and zoom level
leaflet() %>% addTiles() %>% setView(lng = 94.1, lat = 47.4, 
    zoom = 7)

  • Set a starting origin and zoom level
leaflet() %>% addTiles() %>%
  setView(lng = 94.1, lat = 47.4, zoom = 4)

  • Various base map options with addProviderTiles()
leaflet() %>% addProviderTiles("Esri.WorldStreetMap") %>%
  setView(lng = 94.1, lat = 47.4, zoom = 4)

  • Various base map options with addProviderTiles()
leaflet() %>% addProviderTiles("Esri.WorldImagery") %>%
  setView(lng = 94.1, lat = 47.4, zoom = 4)

  • sf objects can be passed as additional map layers
  • must be in lat/long format
leaflet(data = st_transform(occupancies, crs = 4326)) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addPolygons()

  • Multiple map layers
leaflet(data = st_transform(occupancies, crs = 4326)) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addPolygons() %>%
  addMarkers(data = st_transform(sights, crs = 4326))

pal <- colorNumeric("YlOrRd", domain = occupancies$Occ)
leaflet(st_transform(occupancies, crs = 4326)) %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addMarkers(data = st_transform(sights, crs = 4326)) %>%
  addPolygons(color = ~pal(Occ), fillOpacity = 0.4) %>%
  addLegend(pal = pal, values = ~Occ) %>%
  addMiniMap(position = "bottomleft")

Further reading